
Online Meta-learning by Parallel Algorithm Competition

Abstract

The efficiency of reinforcement learning algorithms depends critically on a few meta-parameters that modulate the learning updates and the trade-off between exploration and exploitation. The adaptation of the meta-parameters is an open question in reinforcement learning, which arguably has become more of an issue recently with the success of deep reinforcement learning in high-dimensional state spaces. The long learning times in domains such as Atari 2600 video games make it infeasible to perform comprehensive searches for appropriate meta-parameter values. We propose the Online Meta-learning by Parallel Algorithm Competition (OMPAC) method. In the OMPAC method, several instances of a reinforcement learning algorithm are run in parallel with small differences in the initial values of the meta-parameters. After a fixed number of episodes, the instances are selected based on their performance in the task at hand. Before continuing the learning, Gaussian noise is added to the meta-parameters with a predefined probability. We validate the OMPAC method by improving the state-of-the-art results in stochastic SZ-Tetris and in standard Tetris with a smaller, 10$\times$10, board, by 31% and 84%, respectively, and by improving the results for deep Sarsa($\lambda$) agents in three Atari 2600 games by 62% or more. The experiments also show the ability of the OMPAC method to adapt the meta-parameters according to the learning progress in different tasks.
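The selection-and-perturbation loop described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract only states that instances are selected by performance and that Gaussian noise is added to the meta-parameters with a predefined probability, so the truncation-selection scheme, the parameter names, and the noise scale below are all assumptions.

```python
import copy
import random

def ompac_round(metaparams, scores, noise_prob=0.2, noise_std=0.1):
    """One OMPAC selection round (illustrative sketch).

    metaparams: list of dicts mapping meta-parameter name -> value,
                one dict per parallel learner instance.
    scores:     task performance of each instance over the last
                block of episodes.
    Returns the meta-parameter sets for the next block of episodes.
    """
    # Rank instances by performance and clone the better half over
    # the worse half. Truncation selection is an assumption here;
    # the paper only says selection is performance-based.
    order = sorted(range(len(metaparams)),
                   key=lambda i: scores[i], reverse=True)
    survivors = [copy.deepcopy(metaparams[i])
                 for i in order[: max(1, len(order) // 2)]]
    next_gen = [copy.deepcopy(survivors[i % len(survivors)])
                for i in range(len(metaparams))]

    # With a predefined probability, perturb each meta-parameter
    # with Gaussian noise before learning continues.
    for params in next_gen:
        for name in params:
            if random.random() < noise_prob:
                params[name] += random.gauss(0.0, noise_std)
    return next_gen
```

In use, each parallel learner would train for a fixed number of episodes, report its score, and receive its (possibly perturbed) meta-parameters for the next block from `ompac_round`.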
